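Choose one of the indicators below, obtain its data with WDI, and carry out the analysis that follows.
For each step, write down your observations (things you noticed, questions that arose, and so on).
Submissions made to the Moodle exercise assignment box by 23:59 on 2023.1.25 will be reviewed and given feedback as early as possible. Later submissions will also be reviewed, but expect the feedback to be delayed.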
Government expenditure on education, total (% of GDP):SE.XPD.TOTL.GD.ZS [Link]
School enrollment, primary (% gross):SE.PRM.ENRR [Link]
School enrollment, secondary (% gross):SE.SEC.ENRR [Link]
School enrollment, tertiary (% gross):SE.TER.ENRR [Link]
Mortality rate, under-5 (per 1,000 live births):SH.DYN.MORT [Link]
School enrollment, primary and secondary (gross), gender parity index (GPI):SE.ENR.PRSC.FM.ZS [Link]
Ratio of female to male labor force participation rate (%) (modeled ILO estimate):SL.TLF.CACT.FM.ZS [Link]
Unemployment, female (% of female labor force) (modeled ILO estimate):SL.UEM.TOTL.FE.ZS [Link]
Unemployment, male (% of male labor force) (modeled ILO estimate):SL.UEM.TOTL.MA.ZS [Link]
Net official development assistance and official aid received (current US$):DT.ODA.ALLD.CD [Link]
Overview: we analyze data on government expenditure on education as a share of gross domestic product (Government expenditure on education, total (% of GDP)).
Government expenditure on education, total (% of GDP):SE.XPD.TOTL.GD.ZS [Link]
library(tidyverse)
library(WDI)
Use the WDI package to download the data directly, naming the variable ed_exp.
df_ed_exp <- WDI(indicator = c(ed_exp = "SE.XPD.TOTL.GD.ZS"))
write_csv(df_ed_exp, "data/ed_exp.csv")
df_ed_exp <- read_csv("data/ed_exp.csv")
Rows: 16758 Columns: 5
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, ed_exp
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
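The download-then-save step contacts the World Bank API on every run, and `write_csv()` fails if the `data/` directory does not exist yet. A minimal sketch of a caching wrapper is shown below. `cached_read()` and its `download` argument are hypothetical names introduced here for illustration; they are not part of WDI or readr, and the demonstration uses a stand-in function instead of a real download.

```r
library(readr)

# Hypothetical helper: if `path` is missing, call `download()` (a zero-argument
# function returning a data frame), save the result, then read the file.
cached_read <- function(path, download) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    write_csv(download(), path)
  }
  read_csv(path, show_col_types = FALSE)
}

# Offline demonstration with made-up rows standing in for the real download.
demo_path <- file.path(tempdir(), "wdi_demo", "ed_exp.csv")
unlink(demo_path)  # start from a clean state
calls <- 0
fake_download <- function() {
  calls <<- calls + 1  # count how often the "download" actually runs
  data.frame(country = c("A", "B"), year = c(2020, 2020), ed_exp = c(3.5, 4.2))
}

first  <- cached_read(demo_path, fake_download)  # downloads and saves
second <- cached_read(demo_path, fake_download)  # served from the saved file
```

With the real data, the call might look like `cached_read("data/ed_exp.csv", function() WDI(indicator = c(ed_exp = "SE.XPD.TOTL.GD.ZS")))`.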
df_ed_exp
str(df_ed_exp)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country: chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ ed_exp : num [1:16758] 3.91 4.63 4.35 4.54 4.74 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. ed_exp = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_ed_exp |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_ed_exp |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
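`REGION` lists iso2c codes of aggregates (regions, income groups, the world), so any exclusion has to test the `iso2c` column. The sketch below illustrates this on a few made-up rows; `REGION_DEMO` is a hypothetical two-code subset used only for this demonstration.

```r
library(dplyr)

# Made-up rows in the WDI layout (the values are not real data).
toy <- data.frame(
  country = c("World", "Japan", "European Union", "Brazil"),
  iso2c   = c("1W", "JP", "EU", "BR"),
  value   = c(1, 2, 3, 4)
)
REGION_DEMO <- c("1W", "EU")  # hypothetical subset of the REGION codes

# Matching on iso2c removes the aggregates.
countries_only <- toy |> filter(!(iso2c %in% REGION_DEMO))

# Matching on country removes nothing, because REGION holds codes, not names.
by_name <- toy |> filter(!(country %in% REGION_DEMO))
```

Here `countries_only` keeps Japan and Brazil, while `by_name` still contains all four rows.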
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_ed_exp |> drop_na(ed_exp) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
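The bar chart above shows how many countries have a non-missing value in each year. The same information can be obtained as a table, which makes it easier to read off exact counts. A minimal sketch on made-up rows follows; the column name `n_countries` is introduced here for illustration.

```r
library(dplyr)

# Made-up rows in the WDI layout; NA marks a missing observation.
toy <- data.frame(
  iso2c  = c("JP", "JP", "BR", "BR", "ZA"),
  year   = c(2019, 2020, 2019, 2020, 2020),
  ed_exp = c(NA, 3.4, 6.0, 5.8, 6.2)
)

# Number of non-missing observations per year, largest first.
n_by_year <- toy |>
  filter(!is.na(ed_exp)) |>
  count(year, name = "n_countries") |>
  arrange(desc(n_countries))
```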
df_ed_exp |> filter(country == "Japan") |>
drop_na(ed_exp) |> arrange(desc(year))
df_ed_exp |> filter(country == "Japan") |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line()
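Observations and questions
What caused the sharp rise in the 1970s and the sharp decline around 1990?
The value falls from around 2014, rises from around 2018, and falls again from 2020 to 2021.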
df_ed_exp |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_ed_exp |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
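Because the colour mapping sits inside `geom_line()`, `geom_smooth()` sees no grouping and fits a single loess curve through the pooled points of all the countries. The sketch below shows the corresponding base-R computation with `stats::loess()` on made-up numbers for two hypothetical countries; the values are illustrative only.

```r
# Made-up pooled observations: ten years for each of two hypothetical countries.
toy <- data.frame(
  year  = rep(2000:2009, times = 2),
  value = c(4.0, 4.2, 4.1, 4.5, 4.6, 4.8, 4.7, 5.0, 5.1, 5.3,
            6.0, 5.9, 6.2, 6.1, 6.4, 6.3, 6.6, 6.8, 6.7, 7.0)
)

# One local regression through all points (loess defaults: span 0.75, degree 2).
fit <- loess(value ~ year, data = toy)

# Smoothed values at each year; they run between the two countries' series.
smoothed <- predict(fit, newdata = data.frame(year = 2000:2009))
```

To draw one smooth curve per country instead, the colour mapping can be moved into the top-level call, as in `ggplot(aes(year, ed_exp, col = country))`.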
Observations and questions
df_ed_exp |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_ed_exp |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of data available, we start with 2020.
# REGION holds iso2c codes, so the exclusion must test iso2c, not country
df_ed_exp |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(ed_exp) |>
ggplot(aes(ed_exp)) + geom_histogram(binwidth = 1)
Note: the 2020 values for the five SACU countries, which are used for the vertical lines below, can be listed as follows.
df_ed_exp |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to mark the values for Japan and the five SACU countries with vertical lines, do the following.
JP <- 3.416981
SAF <- df_ed_exp |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(ed_exp)
df_ed_exp |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(ed_exp) |>
ggplot() + geom_histogram(aes(ed_exp), binwidth = 1) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") + labs(title = "Education expenditure as % of GDP, 2020", subtitle = "Japan: blue, SACU: red")
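The value for Japan is typed in by hand above, which can go stale when the data are downloaded again. A minimal sketch of pulling it from the data frame instead, in the same way as `SAF`, is shown below on made-up rows.

```r
library(dplyr)

# Made-up rows in the WDI layout (not real values).
toy <- data.frame(
  country = c("Japan", "Japan", "Brazil"),
  year    = c(2019, 2020, 2020),
  ed_exp  = c(3.2, 3.4, 6.0)
)

# Extract the single value for Japan in 2020 as a plain number.
JP <- toy |>
  filter(country == "Japan", year == 2020) |>
  pull(ed_exp)
```

With the real data, `toy` would be replaced by `df_ed_exp`; if Japan has no value for the chosen year, `JP` would be `NA` and no line would be drawn.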
df_ed_exp |> filter(year == 2020) |> drop_na(ed_exp) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(ed_exp)) |> head(10) |>
ggplot(aes(fct_reorder(country, ed_exp), ed_exp)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "Government expenditure on education, total (% of GDP)")
df_ed_exp |> filter(year == 2020) |> drop_na(ed_exp) |>
filter(!(iso2c %in% REGION))|>
arrange(ed_exp) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, ed_exp)), ed_exp)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "Government expenditure on education, total (% of GDP)")
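The two rankings above combine `arrange()` with `head(10)`. As an alternative sketch, dplyr's `slice_max()` and `slice_min()` select the largest and smallest rows directly. The example uses made-up values and takes three rows rather than ten to keep it short.

```r
library(dplyr)

# Made-up values; D is missing and is dropped before ranking.
toy <- data.frame(
  country = c("A", "B", "C", "D", "E", "F"),
  ed_exp  = c(2.1, 7.4, 4.4, NA, 5.9, 3.0)
)

ranked  <- toy |> filter(!is.na(ed_exp))
top3    <- ranked |> slice_max(ed_exp, n = 3)  # B, E, C
bottom3 <- ranked |> slice_min(ed_exp, n = 3)  # A, F, C
```

Note that both functions keep ties by default (`with_ties = TRUE`), so they can return more than `n` rows when values are equal.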
library(tidyverse)
library(WDI)
Use the WDI package to download the data directly, naming the variable primary.
df_primary <- WDI(indicator = c(primary = "SE.PRM.ENRR"))
write_csv(df_primary, "data/primary.csv")
df_primary <- read_csv("data/primary.csv")
Rows: 16758 Columns: 5
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, primary
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_primary
str(df_primary)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country: chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ primary: num [1:16758] 105 105 106 105 104 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. primary = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_primary |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_primary |> drop_na(primary) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_primary |> filter(country == "Japan") |>
drop_na(primary) |> arrange(desc(year))
df_primary |> filter(country == "Japan") |> drop_na(primary) |>
ggplot(aes(year, primary)) + geom_line()
Observations and questions
df_primary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(primary) |>
ggplot(aes(year, primary)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_primary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(primary) |>
ggplot(aes(year, primary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_primary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(primary) |>
ggplot(aes(year, primary)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_primary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(primary) |>
ggplot(aes(year, primary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of data available, we start with 2020.
# REGION holds iso2c codes, so the exclusion must test iso2c, not country
df_primary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(primary) |>
ggplot(aes(primary)) + geom_histogram(binwidth = 5)
Note: the 2020 values for the five SACU countries, which are used for the vertical lines below, can be listed as follows.
df_primary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to mark the values for Japan and the five SACU countries with vertical lines, do the following.
JP <- 102.73683
SAF <- df_primary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(primary)
df_primary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(primary) |>
ggplot() + geom_histogram(aes(primary), binwidth = 5) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") + labs(title = "Primary school enrollment (% gross), 2020", subtitle = "Japan: blue, SACU: red")
df_primary |> filter(year == 2020) |> drop_na(primary) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(primary)) |> head(10) |>
ggplot(aes(fct_reorder(country, primary), primary)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "School enrollment, primary (% gross)")
df_primary |> filter(year == 2020) |> drop_na(primary) |>
filter(!(iso2c %in% REGION))|>
arrange(primary) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, primary)), primary)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "School enrollment, primary (% gross)")
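The same sequence of steps is repeated for every indicator with only the column name changed. One possible way to reduce the repetition is a small helper that takes the column name as a string. `plot_trend()` below is a hypothetical function written for this illustration, not something provided by the tidyverse, and the data are made up.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: line chart of one indicator column for chosen countries.
# `var` is the column name as a string; .data[[var]] looks it up in the data.
plot_trend <- function(df, var, countries) {
  df |>
    filter(country %in% countries, !is.na(.data[[var]])) |>
    ggplot(aes(year, .data[[var]])) +
    geom_line(aes(col = country)) +
    labs(y = var)
}

# Made-up rows in the WDI layout; one value is missing.
toy <- data.frame(
  country = rep(c("Japan", "Brazil"), each = 3),
  year    = rep(2018:2020, times = 2),
  primary = c(101, 102, NA, 110, 111, 112)
)

p <- plot_trend(toy, "primary", c("Japan", "Brazil"))
```

With the real data, a call such as `plot_trend(df_primary, "primary", SOUTH_AFRICA_FIVE)` would correspond to the SACU line chart above.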
library(tidyverse)
library(WDI)
Use the WDI package to download the data directly, naming the variable secondary.
df_secondary <- WDI(indicator = c(secondary = "SE.SEC.ENRR"))
write_csv(df_secondary, "data/secondary.csv")
df_secondary <- read_csv("data/secondary.csv")
Rows: 16758 Columns: 5
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, secondary
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_secondary
str(df_secondary)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country : chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ secondary: num [1:16758] NA NA 43.8 43.4 43.2 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. secondary = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_secondary |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_secondary |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_secondary |> drop_na(secondary) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_secondary |> filter(country == "Japan") |>
drop_na(secondary) |> arrange(desc(year))
df_secondary |> filter(country == "Japan") |> drop_na(secondary) |>
ggplot(aes(year, secondary)) + geom_line()
Observations and questions
df_secondary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(secondary) |>
ggplot(aes(year, secondary)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_secondary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(secondary) |>
ggplot(aes(year, secondary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_secondary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(secondary) |>
ggplot(aes(year, secondary)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_secondary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(secondary) |>
ggplot(aes(year, secondary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of data available, we start with 2020.
# REGION holds iso2c codes, so the exclusion must test iso2c, not country
df_secondary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(secondary) |>
ggplot(aes(secondary)) + geom_histogram(binwidth = 10)
Note: the 2020 values for the five SACU countries, which are used for the vertical lines below, can be listed as follows.
df_secondary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to mark the values for Japan and the five SACU countries with vertical lines, do the following.
JP <- 102.84480
SAF <- df_secondary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(secondary)
df_secondary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(secondary) |>
ggplot() + geom_histogram(aes(secondary), binwidth = 10) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") + labs(title = "Secondary school enrollment (% gross), 2020", subtitle = "Japan: blue, SACU: red")
df_secondary |> filter(year == 2020) |> drop_na(secondary) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(secondary)) |> head(10) |>
ggplot(aes(fct_reorder(country, secondary), secondary)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "secondary school enrollment")
df_secondary |> filter(year == 2020) |> drop_na(secondary) |>
filter(!(iso2c %in% REGION))|>
arrange(secondary) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, secondary)), secondary)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "secondary school enrollment")
library(tidyverse)
library(WDI)
Use the WDI package to download the data directly, naming the variable tertiary.
df_tertiary <- WDI(indicator = c(tertiary = "SE.TER.ENRR"))
write_csv(df_tertiary, "data/tertiary.csv")
df_tertiary <- read_csv("data/tertiary.csv")
Rows: 16758 Columns: 5
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, tertiary
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_tertiary
str(df_tertiary)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country : chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ tertiary: num [1:16758] NA 8.85 9.23 8.81 8.9 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. tertiary = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_tertiary |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_tertiary |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_tertiary |> drop_na(tertiary) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_tertiary |> filter(country == "Japan") |>
drop_na(tertiary) |> arrange(desc(year))
df_tertiary |> filter(country == "Japan") |> drop_na(tertiary) |>
ggplot(aes(year, tertiary)) + geom_line()
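Observations and questions
What caused the sharp rise in the 1970s and the renewed increase from around 1990?
How is tertiary (post-secondary) education defined for this indicator?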
df_tertiary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(tertiary) |>
ggplot(aes(year, tertiary)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_tertiary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(tertiary) |>
ggplot(aes(year, tertiary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_tertiary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(tertiary) |>
ggplot(aes(year, tertiary)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_tertiary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(tertiary) |>
ggplot(aes(year, tertiary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of data available, we start with 2020.
# REGION holds iso2c codes, so the exclusion must test iso2c, not country
df_tertiary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(tertiary) |>
ggplot(aes(tertiary)) + geom_histogram(binwidth = 10)
Note: the 2020 values for the five SACU countries, which are used for the vertical lines below, can be listed as follows.
df_tertiary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to mark the values for Japan and the five SACU countries with vertical lines, do the following.
JP <- 62.13584
SAF <- df_tertiary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(tertiary)
df_tertiary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(tertiary) |>
ggplot() + geom_histogram(aes(tertiary), binwidth = 10) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") + labs(title = "Tertiary school enrollment (% gross), 2020", subtitle = "Japan: blue, SACU: red")
df_tertiary |> filter(year == 2020) |> drop_na(tertiary) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(tertiary)) |> head(10) |>
ggplot(aes(fct_reorder(country, tertiary), tertiary)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "tertiary school enrollment")
df_tertiary |> filter(year == 2020) |> drop_na(tertiary) |>
filter(!(iso2c %in% REGION))|>
arrange(tertiary) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, tertiary)), tertiary)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "tertiary school enrollment")
library(tidyverse)
library(WDI)
Use the WDI package to download the data directly, naming the variable under5.
df_under5 <- WDI(indicator = c(under5 = "SH.DYN.MORT"))
write_csv(df_under5, "data/under5.csv")
df_under5 <- read_csv("data/under5.csv")
Rows: 16758 Columns: 5
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, under5
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_under5
str(df_under5)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country: chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ under5 : num [1:16758] NA 57.3 59.1 60.9 62.9 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. under5 = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_under5 |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_under5 |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_under5 |> drop_na(under5) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_under5 |> filter(country == "Japan") |>
drop_na(under5) |> arrange(desc(year))
df_under5 |> filter(country == "Japan") |> drop_na(under5) |>
ggplot(aes(year, under5)) + geom_line()
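Observations and questions
The rate has been declining continuously.
Around 1960 the rate was about 40 per 1,000 live births (4%), so was it perhaps about 50 per 1,000 (5%) around 1950?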
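This indicator is expressed per 1,000 live births, so a value of 40 corresponds to 4 percent, not 40 percent. A small sketch of the conversion is shown below; `per_1000_to_percent()` is a hypothetical helper name introduced for this illustration.

```r
# Hypothetical helper: convert a rate per 1,000 into a percentage.
# rate / 1000 * 100 simplifies to rate / 10.
per_1000_to_percent <- function(rate_per_1000) {
  rate_per_1000 / 10
}

pct_40 <- per_1000_to_percent(40)  # 40 per 1,000 is 4 percent
```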
df_under5 |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(under5) |>
ggplot(aes(year, under5)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_under5 |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(under5) |>
ggplot(aes(year, under5)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_under5 |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(under5) |>
ggplot(aes(year, under5)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_under5 |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(under5) |>
ggplot(aes(year, under5)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of data available, we start with 2020.
# REGION holds iso2c codes, so the exclusion must test iso2c, not country
df_under5 |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(under5) |>
ggplot(aes(under5)) + geom_histogram(binwidth = 10)
Note: the 2020 values for the five SACU countries, which are used for the vertical lines below, can be listed as follows.
df_under5 |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to mark the values for Japan and the five SACU countries with vertical lines, do the following.
JP <- 2.4
SAF <- df_under5 |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(under5)
df_under5 |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(under5) |>
ggplot() + geom_histogram(aes(under5), binwidth = 10) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") + labs(title = "Under-5 mortality rate (per 1,000 live births), 2020", subtitle = "Japan: blue, SACU: red")
df_under5 |> filter(year == 2020) |> drop_na(under5) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(under5)) |> head(10) |>
ggplot(aes(fct_reorder(country, under5), under5)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries: under-5 mortality rate (per 1,000 live births)", y = "under 5 mortality", x = "country")
df_under5 |> filter(year == 2020) |> drop_na(under5) |>
filter(!(iso2c %in% REGION))|>
arrange(under5) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, under5)), under5)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", y = "under 5 mortality", x = "country")
library(tidyverse)
library(WDI)
Use the WDI package to download the data directly, naming the variable school_gpi.
df_school_gpi <- WDI(indicator = c(school_gpi = "SE.ENR.PRSC.FM.ZS"))
write_csv(df_school_gpi, "data/school_gpi.csv")
df_school_gpi <- read_csv("data/school_gpi.csv")
Rows: 16758 Columns: 5
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, school_gpi
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_school_gpi
str(df_school_gpi)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country : chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ school_gpi: num [1:16758] NA NA 0.944 0.941 0.94 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. school_gpi = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_school_gpi |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_school_gpi |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_school_gpi |> drop_na(school_gpi) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_school_gpi |> filter(country == "Japan") |>
drop_na(school_gpi) |> arrange(desc(year))
df_school_gpi |> filter(country == "Japan") |> drop_na(school_gpi) |>
ggplot(aes(year, school_gpi)) + geom_line()
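Observations and questions
The value fluctuates until around 1995 and declines after that.
The differences are small, so it may not be appropriate to read much into them.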
df_school_gpi |> filter(country %in% SOUTH_AFRICA_FIVE) |>
drop_na(school_gpi) |>
ggplot(aes(year, school_gpi)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_school_gpi |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(school_gpi) |>
ggplot(aes(year, school_gpi)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_school_gpi |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(school_gpi) |>
ggplot(aes(year, school_gpi)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_school_gpi |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(school_gpi) |>
ggplot(aes(year, school_gpi)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of data, 2020 has few observations, so we look at 2019 instead.
# REGION holds iso2c codes, so the exclusion must test iso2c, not country
df_school_gpi |> filter(year == 2019) |> filter(!(iso2c %in% REGION)) |>
drop_na(school_gpi) |>
ggplot(aes(school_gpi)) + geom_histogram(binwidth = 0.02)
Note: the 2019 values for the five SACU countries, which are used for the vertical lines below, can be listed as follows.
df_school_gpi |> filter(year == 2019) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to mark the values for Japan and the five SACU countries with vertical lines, do the following.
JP <- 1.00341 # no recent data after 2019
SAF <- df_school_gpi |> filter(year == 2019) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(school_gpi)
df_school_gpi |> filter(year == 2019) |> filter(!(iso2c %in% REGION)) |>
drop_na(school_gpi) |>
ggplot() + geom_histogram(aes(school_gpi), binwidth = 0.02) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") + labs(title = "Primary and secondary school enrollment GPI, 2019", subtitle = "Japan: blue, SACU: red")
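The year used for this indicator was chosen by looking at how many observations each year has. A possible way to make that choice explicit is to pick the most recent year with at least a minimum number of non-missing values. The sketch below uses made-up rows, and the threshold `min_n` is an arbitrary value chosen for the illustration.

```r
library(dplyr)

# Made-up observations: 2020 has only one value, earlier years have three.
toy <- data.frame(
  year       = c(2018, 2018, 2018, 2019, 2019, 2019, 2020),
  school_gpi = c(0.98, 1.01, 0.95, 0.99, 1.02, 0.96, 1.00)
)

min_n <- 3  # arbitrary threshold for "enough data" in this toy example

# Most recent year that has at least min_n non-missing observations.
latest_year <- toy |>
  filter(!is.na(school_gpi)) |>
  count(year) |>
  filter(n >= min_n) |>
  summarise(year = max(year)) |>
  pull(year)
```

With the real data, a much larger threshold would be appropriate, and aggregates would need to be excluded with `REGION` before counting.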
df_school_gpi |> filter(year == 2019) |> drop_na(school_gpi) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(school_gpi)) |> head(10) |>
ggplot(aes(fct_reorder(country, school_gpi), school_gpi)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "primary and secondary enrollment, GPI")
df_school_gpi |> filter(year == 2019) |> drop_na(school_gpi) |>
filter(!(iso2c %in% REGION))|>
arrange(school_gpi) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, school_gpi)), school_gpi)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "primary and secondary enrollment, GPI")
library(tidyverse)
library(WDI)
Use the WDI package to download the data directly, naming the variable job_gpi.
df_job_gpi <- WDI(indicator = c(job_gpi = "SL.TLF.CACT.FM.ZS"))
write_csv(df_job_gpi, "data/job_gpi.csv")
df_job_gpi <- read_csv("data/job_gpi.csv")
Rows: 16758 Columns: 5
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, job_gpi
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_job_gpi
str(df_job_gpi)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country: chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ job_gpi: num [1:16758] 87.5 87.2 86.7 86.9 86.6 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. job_gpi = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_job_gpi |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_job_gpi |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_job_gpi |> drop_na(job_gpi) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_job_gpi |> filter(country == "Japan") |>
drop_na(job_gpi) |> arrange(desc(year))
df_job_gpi |> filter(country == "Japan") |> drop_na(job_gpi) |>
ggplot(aes(year, job_gpi)) + geom_line()
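Observations and questions
The ratio has been rising since around 2000. What policy changes were made?
If the rise continues at this pace, the ratio will exceed 90 around 2040 and approach 100. Could we then say that the problem has been solved?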
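A projection like the one above assumes that the recent trend continues as a straight line. One way to check such a statement is to fit a linear model to the recent years and extrapolate. The sketch below uses a synthetic series, not WDI data, purely to show the mechanics; the linear form itself is an assumption, and a ratio need not keep rising at a constant rate.

```r
# Synthetic series (not WDI data): rises by exactly 2 units per year.
toy <- data.frame(
  year  = 2000:2004,
  value = c(10, 12, 14, 16, 18)
)

# Fit a straight line and extrapolate it to a later year.
fit <- lm(value ~ year, data = toy)
pred_2010 <- unname(predict(fit, newdata = data.frame(year = 2010)))
```

With the real data, `toy` would be replaced by the rows for Japan in `df_job_gpi` from around 2000 onward, with `job_gpi` as the response.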
df_job_gpi |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(job_gpi) |>
ggplot(aes(year, job_gpi)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_job_gpi |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(job_gpi) |>
ggplot(aes(year, job_gpi)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_job_gpi |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(job_gpi) |>
ggplot(aes(year, job_gpi)) + geom_line(aes(col = country))
Note: an averaged trend can also be shown as a curve. With loess, the points are approximated by a smooth curve.
df_job_gpi |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(job_gpi) |>
ggplot(aes(year, job_gpi)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of data available, we start with 2020.
# REGION holds iso2c codes, so the exclusion must test iso2c, not country
df_job_gpi |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(job_gpi) |>
ggplot(aes(job_gpi)) + geom_histogram(binwidth = 10)
Note: the 2020 values for the five SACU countries, which are used for the vertical lines below, can be listed as follows.
df_job_gpi |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to mark the values for Japan and the five SACU countries with vertical lines, do the following.
JP <- 74.51027
SAF <- df_job_gpi |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(job_gpi)
df_job_gpi |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(job_gpi) |>
ggplot() + geom_histogram(aes(job_gpi), binwidth = 10) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") + labs(title = "Ratio of female to male labor force participation rate (%), 2020", subtitle = "Japan: blue, SACU: red")
df_job_gpi |> filter(year == 2020) |> drop_na(job_gpi) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(job_gpi)) |> head(10) |>
ggplot(aes(fct_reorder(country, job_gpi), job_gpi)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "ratio of female to male labor force participation rate (%)")
df_job_gpi |> filter(year == 2020) |> drop_na(job_gpi) |>
filter(!(iso2c %in% REGION))|>
arrange(job_gpi) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, job_gpi)), job_gpi)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "ratio of female to male labor force participation rate (%)")
library(tidyverse)
library(WDI)
Use the WDI package to download the data directly, naming the variable female_unemploy.
df_female_unemploy <- WDI(indicator = c(female_unemploy = "SL.UEM.TOTL.FE.ZS"))
write_csv(df_female_unemploy, "data/female_unemploy.csv")
df_female_unemploy <- read_csv("data/female_unemploy.csv")
Rows: 16758 Columns: 5
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, female_unemploy
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_female_unemploy
str(df_female_unemploy)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country : chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ female_unemploy: num [1:16758] 8.51 8.5 8.12 7.62 7.42 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. female_unemploy = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_female_unemploy |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_female_unemploy |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
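Note: REGION holds iso2c codes, so the exclusion only works when it is compared with the iso2c column. The sketch below uses a made-up toy table and a shortened code list (`REGION_TOY`, a hypothetical name) to show that comparing against country names removes nothing.

```r
library(dplyr)

# Toy rows with made-up values; "ZH" and "1W" are aggregate codes from REGION.
toy <- tibble(
  country = c("Africa Eastern and Southern", "World", "Japan", "South Africa"),
  iso2c   = c("ZH", "1W", "JP", "ZA"),
  value   = c(1, 2, 3, 4)
)
REGION_TOY <- c("ZH", "1W")

# Comparing codes with codes drops the two aggregates.
by_code <- toy |> filter(!(iso2c %in% REGION_TOY))

# Comparing country names with codes never matches, so every row is kept.
by_name <- toy |> filter(!(country %in% REGION_TOY))
```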
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_female_unemploy |> drop_na(female_unemploy) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_female_unemploy |> filter(country == "Japan") |>
drop_na(female_unemploy) |> arrange(desc(year))
df_female_unemploy |> filter(country == "Japan") |> drop_na(female_unemploy) |>
ggplot(aes(year, female_unemploy)) + geom_line()
Observations and questions
df_female_unemploy |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(female_unemploy) |>
ggplot(aes(year, female_unemploy)) + geom_line(aes(col = country))
Note: the average trend can also be drawn as a curve. With loess, the data are approximated by a smooth curve.
df_female_unemploy |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(female_unemploy) |>
ggplot(aes(year, female_unemploy)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
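Note: because the colour mapping is given only to geom_line(), geom_smooth() fits a single curve to all countries pooled together, which is why it reads as an "average". As a rough sketch of the same kind of fit outside ggplot2, the code below calls `loess()` from base R on a made-up toy series. The two series are constructed so that their mean is 20 in every year.

```r
# Made-up toy data: two hypothetical countries, pooled into one table.
toy <- data.frame(
  year  = rep(2000:2019, times = 2),
  value = c(seq(10, 29, by = 1), seq(30, 11, by = -1))
)

# Local regression of value on year; span = 0.75 is the default of loess().
fit <- loess(value ~ year, data = toy, span = 0.75)

# Evaluate the fitted curve once per year.
grid <- data.frame(year = 2000:2019)
grid$smooth <- predict(fit, newdata = grid)
```

To get one smooth curve per country instead, the colour mapping could be moved into `ggplot(aes(year, female_unemploy, col = country))` so that geom_smooth() inherits the grouping.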
Observations and questions
df_female_unemploy |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(female_unemploy) |>
ggplot(aes(year, female_unemploy)) + geom_line(aes(col = country))
Note: the average trend can also be drawn as a curve. With loess, the data are approximated by a smooth curve.
df_female_unemploy |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(female_unemploy) |>
ggplot(aes(year, female_unemploy)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the number of available data points, we look at 2020 first.
df_female_unemploy |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>
drop_na(female_unemploy) |>
ggplot(aes(female_unemploy)) + geom_histogram(binwidth = 2)
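Note: the choice of year can also be checked numerically rather than read off the bar chart. The following is a small sketch on made-up toy data that counts the non-missing observations per year and sorts the years by coverage. The column name `n_countries` is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up toy data: three hypothetical countries, three years, some gaps.
toy <- tibble(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(2019:2021, times = 3),
  value   = c(1, 2, NA, 3, 4, NA, NA, 5, 6)
)

# Count how many countries report a value in each year, best-covered year first.
obs_per_year <- toy |>
  drop_na(value) |>
  count(year, name = "n_countries") |>
  arrange(desc(n_countries), desc(year))
```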
Note: the 2020 values of the five SACU countries, which are to be drawn as vertical lines, can be listed as follows.
df_female_unemploy |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to draw the values of Japan and the five SACU countries as vertical lines, do the following.
JP <- 2.520
SAF <- df_female_unemploy |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(female_unemploy)
df_female_unemploy |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>
drop_na(female_unemploy) |>
ggplot() + geom_histogram(aes(female_unemploy), binwidth = 2) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +labs(title = "Female unemployment rate, 2020", subtitle = "Japan: blue, SACU: red")
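Note: the value for Japan is typed in by hand above, so it has to be updated whenever the data or the year changes. As a sketch, it could instead be pulled from the data in the same way as SAF. The toy table below uses made-up numbers.

```r
library(dplyr)

# Made-up toy values; only the column names follow the notebook.
toy <- tibble(
  country = c("Japan", "Japan", "South Africa"),
  year    = c(2019, 2020, 2020),
  female_unemploy = c(2.2, 2.5, 30.0)
)

# Look up Japan's value for the chosen year instead of hardcoding it.
JP <- toy |>
  filter(country == "Japan", year == 2020) |>
  pull(female_unemploy)
```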
df_female_unemploy |> filter(year == 2020) |> drop_na(female_unemploy) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(female_unemploy)) |> head(10) |>
ggplot(aes(fct_reorder(country, female_unemploy), female_unemploy)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "unemployment rate, female, 2020")
df_female_unemploy |> filter(year == 2020) |> drop_na(female_unemploy) |>
filter(!(iso2c %in% REGION))|>
arrange(female_unemploy) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, female_unemploy)), female_unemploy)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "unemployment rate, female, 2020")
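Note: the arrange() and head() pair used for the two rankings can also be written with dplyr's `slice_max()` and `slice_min()`. This is only an alternative sketch on made-up toy data, not a change to the analysis above.

```r
library(dplyr)

# Made-up toy data with one missing value.
toy <- tibble(
  country = c("A", "B", "C", "D", "E"),
  value   = c(5, NA, 1, 9, 3)
)

ranked <- toy |> filter(!is.na(value))

# Two largest and two smallest values, without an explicit arrange().
top2    <- ranked |> slice_max(value, n = 2)
bottom2 <- ranked |> slice_min(value, n = 2)
```

Unlike head(), both functions keep tied rows by default (`with_ties = TRUE`), so they can return more than `n` rows.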
library(tidyverse)
library(WDI)
Use the WDI package to download the data directly, and name the variable ed_exp.
df_ed_exp <- WDI(indicator = c(ed_exp = "SE.XPD.TOTL.GD.ZS"))
write_csv(df_ed_exp, "data/ed_exp.csv")
df_ed_exp <- read_csv("data/ed_exp.csv")
df_ed_exp
str(df_ed_exp)
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_ed_exp |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_ed_exp |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_ed_exp |> drop_na(ed_exp) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_ed_exp |> filter(country == "Japan") |>
drop_na(ed_exp) |> arrange(desc(year))
df_ed_exp |> filter(country == "Japan") |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line()
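Observations and questions
What caused the sharp rise in the 1970s and the sharp decline around 1990?
The value declines from around 2014, rises from around 2018, and declines again from 2020 to 2021.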
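Note: turning points like these can also be located numerically. The sketch below computes year-over-year changes with `lag()` and picks out the largest rise and the largest drop. The series is made up for illustration and is not Japan's actual data.

```r
library(dplyr)

# Made-up toy series; not actual WDI values.
toy <- tibble(
  year   = 2015:2020,
  ed_exp = c(3.0, 3.5, 3.25, 3.25, 4.0, 3.5)
)

# Change from the previous year; the first year has no predecessor and is dropped.
changes <- toy |>
  arrange(year) |>
  mutate(change = ed_exp - lag(ed_exp)) |>
  filter(!is.na(change))

largest_rise <- changes |> slice_max(change, n = 1)
largest_drop <- changes |> slice_min(change, n = 1)
```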
df_ed_exp |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country))
Note: the average trend can also be drawn as a curve. With loess, the data are approximated by a smooth curve.
df_ed_exp |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_ed_exp |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country))
Note: the average trend can also be drawn as a curve. With loess, the data are approximated by a smooth curve.
df_ed_exp |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the number of available data points, we look at 2020 first.
df_ed_exp |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>
drop_na(ed_exp) |>
ggplot(aes(ed_exp)) + geom_histogram(binwidth = 1)
Note: the 2020 values of the five SACU countries, which are to be drawn as vertical lines, can be listed as follows.
df_ed_exp |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to draw the values of Japan and the five SACU countries as vertical lines, do the following.
JP <- 3.416981
SAF <- df_ed_exp |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(ed_exp)
df_ed_exp |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>
drop_na(ed_exp) |>
ggplot() + geom_histogram(aes(ed_exp), binwidth = 1) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +labs(title = "Government expenditure on education (% of GDP), 2020", subtitle = "Japan: blue, SACU: red")
df_ed_exp |> filter(year == 2020) |> drop_na(ed_exp) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(ed_exp)) |> head(10) |>
ggplot(aes(fct_reorder(country, ed_exp), ed_exp)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "Government expenditure on education, total (% of GDP)")
df_ed_exp |> filter(year == 2020) |> drop_na(ed_exp) |>
filter(!(iso2c %in% REGION))|>
arrange(ed_exp) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, ed_exp)), ed_exp)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "Government expenditure on education, total (% of GDP)")